Modeling Communicative Purpose with Functional Style: Corpus and Features for German Genre and Register Analysis
نویسندگان
چکیده
While there is wide acknowledgement in NLP of the utility of document characterization by genre, it is quite difficult to determine a definitive set of features or even a comprehensive list of genres. This paper addresses both issues. First, with prototype semantics, we develop a hierarchical taxonomy of discourse functions. We implement the taxonomy by developing a new text genre corpus of contemporary German to perform a text based comparative register analysis. Second, we extract a host of style features, both deep and shallow, aiming beyond linguistically motivated features at situational correlates in texts. The feature sets are used for supervised text genre classification, on which our models achieve high accuracy. The combination of the corpus typology and feature sets allows us to characterize types of communicative purpose in a comparative setup, by qualitative interpretation of style feature loadings of a regularized discriminant analysis. Finally, to determine the dependence of genre on topics (which are arguably the distinguishing factor of sub-genre), we compare and combine our style models with Latent Dirichlet Allocation features across different corpus settings with unstable topics.
منابع مشابه
The Assessment of Pragmatic Knowledge in the Online General IELTS-Practice Resources: A Corpus Analysis of Writing Tasks
Motivated by the concept of Communicative Language Ability and the eminence of the IELTS exam, this study intended to scrutinize the representation of functional knowledge (FK) and socio-linguistic knowledge (SK) as sub-components of pragmatic knowledge in the writing performances of both tasks of the online General IELTS-practice resources across three band scores. This quantitative inter-scor...
متن کاملExploring the Use of Linguistic Features in Domain and Genre Classification
The central questions are: How useful is information about part-of-speech frequency for text categorisation? Is it feasible to limit word features to content words for text classifications? This is examined for 5 domain and 4 genre classification tasks using LIMAS, the German equivalent of the Brown corpus. Because LIMAS is too heterogeneous, neither question can be answered reliably for any of...
متن کاملConstitutive Features of the Russian Political Discourse in Ecolinguistic Aspect
The article offers a comparative description of typological mechanisms used in political communicative practice and methods of verbal explication of its axiological and symbolic constituents determining universal mental features of individual/collective consciousness. The research position based on a systemic multilevel analysis of the component structure of discourse facilitates the identifica...
متن کاملA Genre Analysis of Persian Research Article Abstracts: Communicative Moves and Author Identity
Most studies within the area of genre analysis, particularly those conducted in Iran, have exclusively used text analysis. While such investigations have led to important understandings of generic features of texts, it can be argued that incorporating interview data for triangulation can lead to better understanding of generic features of texts. Along this line, this paper reports the results o...
متن کاملApplying Multi-Dimensional Analysis to a Russian Webcorpus: Searching for Evidence of Genres
The paper presents an application of Multidimensional (MD) analysis initially developed for the analysis of register variation in English (Biber, 1988) to the investigation of a genre diverse corpus, which was built from modern texts of the Russian Web. The analysis is based on the idea that each linguistic feature has different frequencies in different registers, and statistically stable co-oc...
متن کامل